Abstract

The  view  of  learning  that  underlies  standard  test  theory  is 
inconsistent  with  the  view  rapidly  emerging  from  cognitive  and  educational 
psychology.  Learners  become  more  competent  not  simply  by  learning  more 
facts  and  skills,  but  by  reconfiguring  their  knowledge;  by  “chunking” 
information  to  reduce  memory  loads;  and  by  developing  strategies  and 
models  that  help  them  discern  when  and  how  facts  and  skills  are  important. 
Neither  classical  test  theory  nor  item  response  theory  (IRT)  is  designed  to 
inform  educational  decisions  conceived  from  this  perspective.  This  paper 
sketches  the  outlines  of  a  test  theory  built  around  models  of  student 
understanding,  as  inspired  by  the  substance  and  the  psychology  of  the 
domain  of  interest.  The  ideas  are  illustrated  with  a  simple  numerical 
example  based  on  Siegler’s  balance  beam  tasks.  Directions  in  which  the 
approach  must  be  developed  to  be  broadly  useful  in  educational  practice  are 
discussed. 


Background 


When  schooling  became  mandatory  at  the  turn  of  the  century,  educators  suddenly 
faced  selection  and  placement  decisions  for  unprecedented  numbers  of  students,  displaying 
the diversity of abilities and backgrounds that individuals bring to schooling (Glaser, 1981). Numbers of correct answers to multiple-choice test items were used to rank
students  according  to  their  overall  proficiencies  in  domains  of  tasks.  These  rankings  were 
used  in  turn  to  predict  students’  success  in  fixed  educational  experiences. 

Classical  test  theory  (CTT)  emerged  when  Spearman  (e.g.,  1907)  applied  statistical 
methods  to  study  how  reliable  estimates  of  this  overall  proficiency  would  be  from  different 
test  forms  that  might  be  constructed  for  the  purpose.  Extensions  of  this  work  led  over  the 
years  to  a  vast  armamentarium  of  techniques  for  building  tests  and  making  decisions  with 
test  scores  (Gulliksen,  1950);  to  an  axiomatic  foundation  for  statistical  inference  about  test 
scores  (Lord,  1959;  Lord  &  Novick,  1968;  Novick,  1966);  and  to  sophisticated  techniques 
for  partitioning  test  score  variance  according  to  facets  of  items,  persons,  and  observational 
settings (Cronbach, Gleser, Nanda, & Rajaratnam, 1972). It is important to note that in all
this  work,  the  object  of  inference  is  overall  proficiency — the  test  score,  observed  or 
expected —  in  terms  of  numbers  of  correct  responses  in  a  domain  of  items. 

Item  response  theory  (IRT;  see  Hambleton,  1989,  for  an  overview)  represented  a 
major  practical  advance  over  CTT  by  modeling  probabilities  of  correct  item  response  in 
terms  of  an  unobservable  proficiency  variable.  IRT  solves  many  problems  that  were 
difficult  under  CTT,  in  equating,  test  construction,  and  adaptive  testing.  Advanced 
statistical  methods  have  been  brought  to  bear  on  inferential  problems  in  IRT,  including 
sophisticated  estimation  algorithms  (e.g.,  Bock  &  Aitkin,  1981),  techniques  from  missing- 
data  theory  (Mislevy,  in  press-a),  and  Bayesian  treatments  of  uncertainty  in  models  and 
parameters  (Lewis,  1985;  Mislevy  &  Sheehan,  1990;  Tsutakawa  &  Johnson,  1988).  The 
underlying psychological model remains quite simple, however; as in CTT, the focus
remains on overall proficiency in a domain of items. From the perspective of IRT, two
students with the same overall proficiency are indistinguishable.

As  useful  as  standard  tests  and  standard  test  theory  have  proven  in  large-scale 
evaluation,  selection,  and  placement  problems,  their  focus  on  who  is  competent  and  how 
many  items  they  answer  can  fall  short  when  the  goal  is  to  improve  individuals’ 
competencies.  Glaser,  Lesgold,  and  Lajoie  (1987)  point  out  that  tests  can  predict  failure 
without an understanding of what causes success, but intervening to prevent failure and
enhance  competence  requires  deeper  understanding. 

The  past  decade  has  witnessed  considerable  progress  toward  the  requisite 
understanding.  Psychological  research  has  moved  away  from  the  traditional  laboratory 
studies  of  simple  (even  random!)  tasks,  to  tasks  that  better  approximate  the  meaningful 
learning  and  problem-solving  activities  that  engage  people  in  real  life.  Studies  comparing 
the  ways  experts  differ  from  novices  in  applied  problem-solving  in  domains  such  as 
physics  and  trouble-shooting  (e.g.,  Chi,  Feltovich  &  Glaser,  1981)  reveal  the  central 
importance  of  knowledge  structures — networks  of  concepts  and  interconnections  among 
them — that  impart  meaning  to  patterns  in  what  one  observes  and  how  one  chooses  to  act. 
The  process  of  learning  is  to  a  large  degree  expanding  these  structures  and,  importantly, 
reconfiguring  them  to  incorporate  new  and  qualitatively  different  connections  as  the  level  of 
understanding  deepens.  Educational  psychologists  have  begun  to  put  these  findings  to 
work  in  designing  both  instruction  and  tests  (e.g.,  Glaser  et  al.,  1987;  Greeno,  1976; 
Marshall,  1985,  in  press).  Again  in  the  words  of  Glaser,  Lesgold,  and  Lajoie  (1987), 

“Achievement  testing  as  we  have  defined  it  is  a  method  of  indexing  stages  of 
competence  through  indicators  of  the  level  of  development  of  knowledge, 
skill,  and  cognitive  process.  These  indicators  display  stages  of  performance 
that  have  been  attained  and  on  which  further  learning  can  proceed.  They 
also  show  forms  of  error  and  misconceptions  in  knowledge  that  result  in 
inefficient  and  incomplete  knowledge  and  skill,  and  that  need  instructional 
attention.” (p. 81)

Paraphrasing  Ohlsson  and  Langley  (1985),  Clancey  (1986)  summarizes  the  shift  in 
perspective:  “[to]  describing  mental  processes,  rather  than  quantifying  performance  with 
respect  to  stimulus  variables;  describing  individuals  in  detail,  not  just  stating  generalities; 
and  giving  psychological  interpretation  to  qualitative  data,  rather  than  statistical  treatment  to 
numerical  measurements”  (p.  391). 

An  Approach  to  Modeling  Student  Understanding 

The  modeling  approach  we  are  beginning  to  pursue  can  be  encapsulated  as  follows: 

“Standard  test  theory  evolved  as  the  application  of  statistical  theory  with  a 
simple  model  of  ability  that  suited  the  decision-making  environment  of  mass 
educational  systems.  Broader  educational  options,  based  on  insights  into 
the nature of learning and supported by more powerful technologies,
demand a broader range of models of capabilities — still simple compared to
the  realities  of  cognition,  but  capturing  patterns  that  inform  a  broader  range 
of  instructional  alternatives.  A  new  test  theory  can  be  brought  about  by 
applying  to  well-chosen  cognitive  models  the  same  general  principles  of 
statistical  inference  that  led  to  standard  test  theory  when  applied  to  the 
simple  model.”  (Mislevy,  in  press-b). 

The  approach  begins  in  a  specific  application  by  defining  a  universe  of  student 
models.  This  “supermodel”  is  indexed  by  parameters  that  signify  distinctions  between 
states of understanding. Symbolically, we shall refer to the (typically vector-valued)
parameter of the student model as η. A particular set of values of η specifies a particular
student  model,  or  one  particular  state  among  the  universe  of  possible  states  the  supermodel 
can  accommodate.  These  parameters  can  be  qualitative  or  quantitative,  and  qualitative 
parameters  can  be  unordered,  partially  ordered,  or  completely  ordered.  A  supermodel  can 
contain  any  mixture  of  these  types.  Their  nature  is  derived  from  the  structure  and  the 
psychology  of  the  learning  area,  the  idea  being  to  capture  the  essential  distinctions  among 
students. 

Any  application  faces  a  modeling  problem,  an  item  construction  problem,  and  an 
inference  problem. 

The  modeling  problem  is  delineating  the  states  or  levels  of  understanding  in  a 
learning  domain.  In  meaningful  applications  this  might  address  several  distinct  strands  of 
learning,  as  understanding  develops  in  a  number  of  key  concepts,  and  it  might  address  the 
connectivity among those concepts.1 Symbolically, this substep defines the structure of
p(x|η), where x represents observations. Obviously any model will be a gross
simplification  of  the  reality  of  cognition.  A  first  consideration  in  what  to  include  in  the 
supermodel  is  the  substance  and  the  psychology  of  the  domain:  Just  what  are  the  key 


1 A particularly interesting special case occurs when the universe of student models can be expressed as
performance models (Clancey, 1986). A performance model consists of a knowledge base and manipulation
rules that can be run on problems in a domain of interest. A particular model can contain both knowledge
and production rules that are incorrect or incomplete; the solutions it produces will be correct or incorrect in
identifiable ways. Here the parameter η specifies features of performance models.
concepts? What are important ways of understanding and misunderstanding them? What
are  typical  paths  to  competence?  A  second  consideration  is  the  so-called  grain-size 
problem,  or  the  level  of  detail  at  which  student-models  should  differ.  A  major  factor  in 
answering  this  question  is  the  decision-making  framework  under  which  the  modeling  will 
take  place.  As  Greeno  (1976)  points  out,  “It  may  not  be  critical  to  distinguish  between 
models  differing  in  processing  details  if  the  details  lack  important  implications  for  quality  of 
student  performance  in  instructional  situations,  or  the  ability  of  students  to  progress  to 
further  stages  of  knowledge  and  understanding.” 

The  item  construction  problem  is  devising  situations  for  which  students  who  differ 
in  the  parameter  space  are  likely  to  behave  in  observably  different  ways.  The  conditional 
probabilities  of  behavior  of  different  types  given  the  unobservable  state  of  the  student  are 
the values of p(x|η), which may in turn be modeled in terms of another set of parameters,
say β. The p(x|η) values provide the basis for inferring back about the student state. An
element in x could contain a right or wrong answer to a multiple-choice test item, but it
could instead be the problem-solving approach regardless of whether the answer is right or
wrong, the quickness of responding, a characteristic of a think-aloud protocol, or an
expert’s  evaluation  of  a  particular  aspect  of  the  performance.  The  effectiveness  of  an  item 
is  reflected  in  differences  in  conditional  probabilities  associated  with  different  parameter 
configurations,  so  an  item  may  be  very  useful  in  distinguishing  among  some  aspects  of 
potential  student  models  but  useless  for  distinguishing  among  others.  Tatsuoka  (1989) 
demonstrates  the  relationship  between  item  construction  and  inference  about  students’ 
strategies  for  subtracting  mixed  numbers. 

The inference problem is reasoning from observations to student models. The
model-building and item construction steps provide η and p(x|η). Let p(η) represent
expectations about η in a population of interest — possibly non-informative, possibly based
on expert opinion or previous analyses. Bayes’ theorem can be employed to draw
inferences about η given x via p(η|x) ∝ p(x|η) p(η). Thus p(η|x) characterizes belief
about a particular student’s model after having observed a sample of the student’s behavior.
Practical problems include characterizing what is known about β so as to determine p(x|η),
carrying out the computations involved in determining p(η|x), and, in some applications,
developing strategies for efficient sequential gathering of observations. As we have noted,
analogous problems have been studied in standard test theory, and the solutions there,
because they are applications of general principles of statistical inference, generalize to
models built around alternative psychological models. The models are more realistic and
more ambitious, but the formalism is identical.2
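The inference step p(η|x) ∝ p(x|η) p(η) can be made concrete with a minimal sketch in Python. The three student models, the two items, and every probability below are hypothetical illustrations, not values from any application discussed here; β is treated as known, and responses are assumed conditionally independent given η. Note that item2 is deliberately non-monotone (novices are more likely than intermediates to answer it correctly), which the formalism handles without difficulty.

```python
# Minimal sketch of Bayesian inference over a discrete universe of student models.
# All states, items, and probabilities are hypothetical illustrations.

# Prior p(eta) over three hypothetical student models.
prior = {"novice": 0.5, "intermediate": 0.3, "expert": 0.2}

# Hypothetical p(x = correct | eta) for two items (beta treated as known).
p_correct = {
    "item1": {"novice": 0.2, "intermediate": 0.7, "expert": 0.9},
    "item2": {"novice": 0.6, "intermediate": 0.3, "expert": 0.95},
}


def posterior(prior, p_correct, responses):
    """Return p(eta | x), proportional to p(x | eta) * p(eta).

    Responses are assumed conditionally independent given eta, so the
    likelihood is a product over items.
    """
    unnormalized = {}
    for eta, p_eta in prior.items():
        likelihood = 1.0
        for item, correct in responses.items():
            p = p_correct[item][eta]
            likelihood *= p if correct else 1.0 - p
        unnormalized[eta] = likelihood * p_eta
    total = sum(unnormalized.values())  # normalizing constant p(x)
    return {eta: value / total for eta, value in unnormalized.items()}


post = posterior(prior, p_correct, {"item1": True, "item2": False})
for eta, p in post.items():
    print(f"{eta}: {p:.3f}")
```

With one correct and one incorrect response, belief shifts from the prior toward "intermediate" (posterior 0.75), even though that state had the lower prior probability.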

Previous  Research 

Research  relevant  to  this  approach  has  been  carried  out  in  a  wide  variety  of  fields, 
including  cognitive  psychology,  the  psychology  of  mathematics  and  science  education, 
artificial  intelligence  (AI)  work  on  student  modeling,  test  theory,  and  statistical  inference. 
Cognitive  scientists  have  suggested  general  structures  such  as  “frames”  or  “schemas”  that 
can  serve  as  a  basis  for  modeling  understanding  (e.g.,  Minsky,  1975;  Rumelhart,  1980), 
and  have  begun  to  devise  tasks  that  probe  their  features  (e.g.,  Marshall,  1989,  in  press). 
Researchers  interested  in  the  psychology  of  learning  in  subject  areas  such  as  proportional 
reasoning  have  focused  on  identifying  key  concepts,  studying  how  they  are  typically 
acquired  (e.g.,  in  mechanics,  Clement,  1982;  in  ratio  and  proportional  reasoning,  Karplus, 
Pulos,  &  Stage,  1983),  and  constructing  observational  settings  that  allow  one  to  infer 
students’  understanding  (e.g.,  van  den  Heuvel,  1990;  McDermott,  1984).  We  make  no 
effort  here  to  review  these  literatures,  but  point  out  that  our  work  can  succeed  only  by 
building  upon  their  foundations.  Our  potential  contribution  would  be  to  the  structures  and 
mechanics  of  model-building  and  inference.  The  following  sections  briefly  mention  some 
important  work  along  these  lines  from  test  theory  and  statistics. 

Modeling  Student  Behavior 

The  standard  models  of  educational  measurement  are  concerned  solely  with 
examinees’  tendencies  to  answer  items  correctly — that  is,  their  overall  proficiency. 
Recently,  however,  models  that  focus  on  patterns  other  than  overall  proficiency  have  begun 
to appear in the test theory literature. Some examples that are relevant to educational
applications  are  listed  below. 


2  Advocates  of  student  modeling  emphasize  the  qualitative  aspects  of  student  models.  Our  approach  is 
compatible  with  this  view,  as  it  is  possible  to  build  universes  of  qualitative  models,  indexed  by  parameters 
that  distinguish  their  features.  Our  knowledge  about  a  particular  student’s  model  is  imperfect,  however.  It 
can  be  expressed  in  terms  of  probabilities  expressing  the  plausibility  of  various  models,  given  what  has 
been observed. Probabilities are quantitative, and admit of a calculus of manipulation. We might thus
employ  a  quantitative  model  for  our  (imperfect)  knowledge  about  qualitative  student  models. 
1. Mislevy and Verhelst’s (1990) mixture models for item responses when different
examinees  follow  different  solution  strategies  or  use  alternative  mental  models.  When  a 
single  IRT  model  cannot  capture  key  distinctions  among  examinees,  it  may  suffice  to  posit 
qualitatively  distinct  classes  of  examinees  and  use  IRT  models  to  summarize  distinctions 
among  examinees  within  these  classes. 

2.  Wilson’s  (1989b)  Saltus  model  for  characterizing  stages  of  conceptual  development. 
This  model  parameterizes  the  differential  patterns  of  strength  and  weakness  expected  as 
learners  progress  through  successive  conceptualizations  of  a  domain. 

3. Falmagne’s (1989) and Haertel’s (1984) latent class models for binary skills.
These models are intended for domains in which competence can be described by the
presence or absence of several (possibly complex) elements of skill or knowledge, and
observational situations can be devised that demand various combinations of these skills.
Also see Paulson (1986) for an alternative use of latent class modeling in cognitive
assessment.

4. Embretson’s (1985) multicomponent models for integrating item construction and
inference  within  a  unified  cognitive  model.  The  conditional  probabilities  of  solution  steps 
given  a  multifaceted  student  model  are  given  by  IRT-like  statistical  structures. 

5. Tatsuoka’s (1989) Rule space analysis. Tatsuoka uses a generalization of IRT
methodology  to  define  a  metric  for  classifying  examinees  based  on  likely  patterns  of  item 
response  given  patterns  of  knowledge  and  strategies. 

6.  Yamamoto’s  (1987)  Hybrid  model  for  dichotomous  responses.  The  Hybrid  model 
characterizes  an  examinee  as  either  belonging  to  one  of  a  number  of  classes  associated  with 
states  of  understanding,  or  in  a  catch-all  IRT  class.  This  approach  might  be  useful  when 
certain  response  patterns  signal  states  of  understanding  for  which  particular  educational 
experiences  are  known  to  be  effective.  Instructional  decisions  are  triggered  by  these 
patterns  if  they  are  detected,  but  by  overall  proficiency  when  no  more  targeted  action  can  be 
provided. 

7. Masters and Mislevy’s (in press) and Wilson’s (1989a) use of the Partial Credit
rating  scale  model  to  characterize  levels  of  understanding,  as  evidenced  by  the  nature  or 
approach  of  a  performance  rather  than  its  correctness.  These  applications  incorporate  into  a 
probabilistic framework the cognitive perspective underlying Biggs and Collis’s (1982)
SOLO  taxonomy  for  describing  salient  qualities  of  performances. 

These  are  the  rudiments  of  models  upon  which  concept-referenced  achievement 
measures  can  be  based.  Applications  to  date  have  been  fairly  limited,  and  most  have 
addressed  one-to-many  relationships  between  an  underlying  knowledge  state  and 
observable  behavior.  That  is,  a  single  (possibly  unordered  or  multifaceted)  variable  has 
been  used  to  characterize  examinees,  and  performance  on  all  items  is  modeled  in  terms  of 
this variable. What is lacking, from the educator’s point of view, is recognition that
meaningful real-world tasks are rarely segregated into these neat little sets. Rather, they
often  involve  multiple  concepts,  connections  among  larger  concepts,  and  transformations 
among  alternative  representations  of  a  domain.  While  the  simple  tasks  that  characterize 
one-to-many  domains  are  essential  at  early  stages  of  learning,  more  complex  tasks  that 
involve  multiple  concepts  in  many-to-many  relationships  are  needed  to  promote  the 
integration  among  concepts  that  form  the  core  of  what  is  often  called  “higher-level 
learning.” 

Inference  Networks 

Recent  developments  in  the  context  of  probability-based  inference  networks 
(Lauritzen  &  Spiegelhalter,  1988;  Pearl,  1988)  offer  a  capability  for  integrating  conceptual 
models  of  the  type  described  above.  These  probability-based  structures  are  attractive  for 
educational  measurement  because  they  permit  a  coherent  extension  of  the  modeling 
approach  and  inferential  logic  of  the  new  cognitive-assessment  models  mentioned  above. 
To  show  how  the  approach  might  be  applied  in  the  educational  setting,  we  first  discuss  an 
application  in  the  setting  of  medical  diagnosis. 

MUNIN  is  an  inference  network  that  organizes  knowledge  in  the  domain  of 
electromyography — the  relationships  among  nerves  and  muscles.  Its  function  is  to 
diagnose  nerve/muscle  disease  states.  The  interested  reader  is  referred  to  Andreassen, 
Woldbye,  Falck,  and  Andersen  (1987)  for  a  fuller  description.  The  prototype  discussed  in 
that  presentation  and  used  for  our  illustration  concerns  a  single  arm  muscle,  with  concepts 
represented by twenty-five nodes and their interactions represented by causal links.3 A
graphic  representation  of  the  network  appears  in  Figure  1. 

[Figure  1  about  here] 

The  rightmost  column  of  nodes  in  Figure  1  concerns  outcomes  of  potentially 
observable  variables,  such  as  symptoms  or  test  results.  These  outcomes  are  the  x  vector  in 
our  earlier  notation.  The  middle  layers  are  “pathophysiological  states,”  or  syndromes. 
These  drive  the  probabilities  of  observations.  The  leftmost  layer  is  the  underlying  disease 
state,  including  three  possible  diseases  in  various  stages,  no  disease,  or  “Other” — a 
condition not built into the system. These states drive the probabilities of syndromes. It
is assumed that a patient’s true state can be adequately characterized by values of these
disease and syndrome states — our η parameter. Paths indicate conditional probability
relationships, which are to be determined either logically, subjectively, purely empirically,
or through model-based statistical estimation. In particular, the paths ending at observables
represent p(x|η). Note that the probabilities of observables depend on some syndromes,
but  not  others.  The  lack  of  a  path  signifies  conditional  independence.  Note  also  that  a 
given  test  result  can  be  caused  by  different  disease  combinations. 
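How a missing path encodes conditional independence, and how an observed test result propagates leftward, can be illustrated with a minimal three-node chain, disease → syndrome → test result. This sketch is not MUNIN: the node names and all probabilities are hypothetical. Because the test result depends on the disease only through the syndrome, p(test | disease) is obtained by summing over syndrome states.

```python
# Minimal sketch of a three-node chain network: disease -> syndrome -> test.
# Node names and probabilities are hypothetical, not taken from MUNIN.

p_disease = {"none": 0.9, "disease_a": 0.1}
p_syndrome_given_disease = {
    "none": {"absent": 0.95, "present": 0.05},
    "disease_a": {"absent": 0.2, "present": 0.8},
}
# No direct disease -> test path: the test is conditionally
# independent of the disease given the syndrome.
p_abnormal_given_syndrome = {"absent": 0.1, "present": 0.85}


def p_disease_given_test(abnormal):
    """Posterior over disease states after observing one test result."""
    unnormalized = {}
    for d, pd in p_disease.items():
        # p(test | disease) = sum over syndromes of p(test | s) p(s | disease)
        p_test = 0.0
        for s, ps in p_syndrome_given_disease[d].items():
            pt = p_abnormal_given_syndrome[s]
            p_test += ps * (pt if abnormal else 1.0 - pt)
        unnormalized[d] = pd * p_test
    z = sum(unnormalized.values())
    return {d: v / z for d, v in unnormalized.items()}


print(p_disease_given_test(abnormal=True))
print(p_disease_given_test(abnormal=False))
```

An abnormal result raises the probability of disease_a from its base rate of 0.1 to about 0.36; a normal result lowers it to about 0.04.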

As a patient enters the clinic, the diagnostician’s state of knowledge about him is
expressed by population base rates, or p(η). This is depicted in Figure 1 by bars that
represent the base probabilities of disease and syndrome states. Base rates of observable
test results are similarly shown. Tests are carried out, one at a time or in clusters, and with
each result the probabilities of disease states are updated. The expectations of tests not yet
given are calculated, and it can be determined which test will be most informative in
identifying the disease state. Knowledge is thus accumulated in stages, from p(η) to
p(η|x₁) after observing the first subset of tests, to p(η|x₁,x₂) after the second, and so on,
with each successive test selected optimally in light of knowledge at that point in time.
Figure 2 illustrates the state of knowledge after a number of electromyographic test results
have been observed. Observable nodes with results now known are depicted with shaded
bars representing observed values. For them, knowledge is perfect. The implications of
these  results  have  been  propagated  leftward  to  syndromes  and  disease  states,  as  shown  by 


3  The  ESPRIT  team  has  generalized  the  application  to  address  clusters  of  interrelated  muscles  in  a  network 
containing  over  a  thousand  nodes. 
distributions that differ from the base rates in Figure 1. These values guide the decision to
test further or initiate a treatment. Finally, updated beliefs about disease states have been
propagated back toward the right to update expectations about the likely outcomes of tests
not yet administered. These expectations, and the potential they hold for further updating
knowledge  about  the  disease  states,  guide  the  selection  of  further  tests. 
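Selecting the next test "optimally" requires a criterion. The text does not say which criterion MUNIN uses, so the minimal sketch below assumes one common choice: pick the test that minimizes the expected entropy of the posterior over disease states. The disease states, the two candidate tests, and their probabilities are hypothetical, and the syndrome layer is omitted so that each test depends directly on the disease state.

```python
import math

# Hypothetical current beliefs about disease states, p(eta | results so far).
prior = {"none": 0.7, "disease_a": 0.2, "disease_b": 0.1}

# Hypothetical p(test abnormal | disease state) for two tests not yet given.
tests = {
    "test1": {"none": 0.1, "disease_a": 0.9, "disease_b": 0.9},
    "test2": {"none": 0.1, "disease_a": 0.9, "disease_b": 0.1},
}


def entropy(dist):
    """Shannon entropy in bits; lower means more certain beliefs."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def update(beliefs, p_abnormal, abnormal):
    """Return (predicted probability of this outcome, posterior beliefs)."""
    unnormalized = {
        d: beliefs[d] * (p_abnormal[d] if abnormal else 1.0 - p_abnormal[d])
        for d in beliefs
    }
    z = sum(unnormalized.values())
    return z, {d: v / z for d, v in unnormalized.items()}


def expected_entropy(beliefs, p_abnormal):
    """Posterior entropy averaged over the test's predicted outcomes."""
    total = 0.0
    for abnormal in (True, False):
        p_outcome, post = update(beliefs, p_abnormal, abnormal)
        total += p_outcome * entropy(post)
    return total


for name, p_abn in tests.items():
    print(name, round(expected_entropy(prior, p_abn), 3))
best = min(tests, key=lambda t: expected_entropy(prior, tests[t]))
print("next test:", best)
```

Under these assumed numbers test1 is preferred: its normal result almost rules out both diseases at once, which reduces uncertainty more on average than test2. Other criteria, such as expected decision utility, could be substituted without changing the structure of the calculation.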

[Figure  2  about  here] 

Inference  Networks  in  the  Educational  Setting 

To  see  how  the  ideas  underlying  MUNIN  apply  to  the  educational  setting,  consider 
the  following  analogy: 


Medical Application                          Educational Application

Observable symptoms, medical tests           Test items, verbal protocols, teachers’
                                             ratings of levels of understanding,
                                             solution traces

Disease states, syndromes                    States or levels of understanding of
                                             key concepts, available strategies

Architecture of interconnections based       Architecture of interconnections based
on medical theory                            on cognitive and educational theory

Conditional probabilities given by           Conditional probabilities given by
physiological models, empirical data,        psychological models, empirical data,
expert opinion                               expert opinion


The  definitions  of  key  concepts  will  be  guided  by  theorized  and  observed  stages  of 
learning  in  the  area,  and  the  connections  with  observables  will  be  expressed  through 
measurement  models  such  as  those  discussed  above.  The  initialization  of  the  probabilities 
in the network will be accomplished by one or more methods: clinical analysis, with skilled
interviewers assessing in detail the nature of students’ understandings and relating these
understandings to task performances; statistical analysis of data concerning selected models
for portions of the larger network (Mislevy & Verhelst, 1990); or theoretical analysis, in
which  logic  or  theory  provides  expectations  for  outcomes  under  hypothesized  cognitive 
states.  After  the  initialization  phase,  connections  can  be  updated  periodically  with  the  larger 
amounts  of  less  precise  data  that  will  be  accumulated  as  students  provide  information  about 
the  adequacy  of  the  relationships  embodied  in  the  network  and  the  accuracy  of  the  baseline 
and  conditional  probabilities. 


A  Numerical  Example 

Siegler’s  balance  beam  tasks 

Kuhn  (1970)  emphasizes  the  central  role  that  exemplars,  or  small,  archetypical 
examples,  play  in  science.  Textbook  examples  are  the  vehicle  through  which  students  are 
acculturated  to  the  concepts  and  relationships  of  a  particular  way  of  viewing  a  class  of 
phenomena — a  paradigm,  in  Kuhn’s  words.  They  function  almost  like  parables  or 
morality tales. New paradigms are introduced with new exemplars, which introduce new
concepts,  highlight  differences  between  the  new  paradigm  and  the  old,  and  demonstrate 
how  the  new  way  of  thinking  solves  problems  the  old  way  could  not.  Modeling  the  states 
of  the  electron  in  the  hydrogen  atom  possesses  this  status  in  quantum  mechanics. 
Explaining  children’s  understanding  of  balance  beam  problems,  an  exemplar  from 
developmental  psychology  originated  by  Piaget,  is  approaching  the  same  status  in  test 
theory (e.g., Kempf, 1983; Mislevy, in press-b; Wilson, 1989b). Robert Siegler’s
balance  beam  tasks  yield  data  that  are,  on  the  surface,  indistinguishable  from  standard  test 
data,  but  there  are  two  key  distinctions: 

1. What is important about examinees is not their overall probability of answering
items  correctly,  but  their  (unobservable)  state  of  understanding  of  the  domain. 

2.  Children  at  less  sophisticated  levels  of  understanding  initially  get  certain  problems 
right  for  the  wrong  reasons.  These  items  are  more  likely  to  be  answered  wrong  at 
intermediate  stages,  as  understanding  deepens!  They  are  bad  items  by  the  standards 
of  classical  test  theory  and  IRT,  because  probabilities  of  correct  response  do  not 
increase  monotonically  with  increasing  total  test  score.  From  the  perspective  of  the 
developmental  theory,  however,  not  only  is  this  reversal  expected,  but  it  plays  an 
important  role  in  distinguishing  among  children  with  different  ways  of  thinking 
about  the  problems. 
Attempting to study children’s reasoning in a manner less subjective than Piaget’s
unstructured  interviews,  Siegler  (1981)  devised  a  series  of  balance  beam  tasks  like  the  one 
illustrated  in  Figure  3.  Varying  numbers  of  weights  are  placed  at  varying  locations  on  a 
balance beam. The child predicts whether the beam will tip to the left, to the right, or remain in
balance. Piaget’s analysis of children’s behavior on balancing tasks (Inhelder & Piaget,
1958) posits that a child will respond in accordance with his or her stage of understanding.
The  usual  stages  through  which  children  progress  can  be  described  in  terms  of  successive 
acquisition  of  the  rules  listed  below. 

[Figure  3  about  here] 

Rule I: If the weights on both sides are equal, it will balance. If they are not equal, the
side  with  the  heavier  weight  will  go  down.  (Weight  is  the  “dominant  dimension,” 
because  children  are  generally  aware  that  weight  is  important  in  the  problem  earlier 
than  they  realize  that  distance  from  the  fulcrum,  the  “subordinate  dimension,”  also 
matters.) 

Rule II: If the weights and distances on both sides are equal, then the beam will balance.
If the weights are equal but the distances are not, the side with the longer distance
will  go  down.  Otherwise,  the  side  with  the  heavier  weight  will  go  down.  (A  child 
using  this  rule  uses  the  subordinate  dimension  only  when  information  from  the 
dominant  dimension  is  equivocal.) 

Rule III: Same as Rule II, except that if the values of both weight and length differ between
the two sides, the child will “muddle through” (Siegler, 1981, p. 6). (A child using
this  rule  now  knows  that  both  dimensions  matter,  but  doesn’t  know  just  how  they 
combine.  Responses  will  be  based  on  a  strategy  such  as  guessing.) 

Rule  IV:  Combine  weights  and  lengths  correctly  (i.e.,  compare  torques,  or  products  of 
weights  and  distances). 

It  was  thus  hypothesized  that  each  child  could  be  classified  into  one  of  five  stages— 
the  four  characterized  by  the  rules,  or  an  earlier  “preoperational”  stage  in  which  neither 
weight nor length is thought to bear any systematic relationship to the action of the beam.

Siegler developed the six types of problems listed below to distinguish among children
at  different  stages  of  reasoning.  (See  Figure  4  for  an  example  of  each.) 


Equal  problems  (E),  with  matching  weights  and  lengths  on  both  sides. 


Dominant  problems  (D),  with  unequal  weights  but  equal  lengths. 

Subordinate  problems  (S),  with  unequal  lengths  but  equal  weights. 

Conflict-dominant  problems  (CD),  in  which  one  side  has  greater  weight,  the  other  has 
greater  length,  and  the  side  with  the  heavier  weight  will  go  down. 

Conflict-subordinate  problems  (CS),  in  which  one  side  has  greater  weight,  the  other  has 
greater  length,  and  the  side  with  the  greater  length  will  go  down. 

Conflict-equal  problems  (CE),  in  which  one  side  has  greater  weight,  the  other  has  greater 
length,  and  the  beam  will  balance. 

[Figure  4  about  here] 

Table  1  shows  the  probabilities  of  correct  response  that  would  be  expected  from 
groups  of  children  in  different  stages  if  their  responses  were  in  complete  accordance  with  the 
hypothesized  rules.  Scanning  across  the  rows  reveals  that  the  probability  of  a  correct 
response  to  a  given  type  of  item  does  not  always  increase  as  the  level  of  understanding 
increases.  For  example,  Stage  II  children  tend  to  answer  CD  items  correctly  for  the  wrong 
reason,  while  Stage  III  children,  now  aware  of  the  conflict,  flounder. 
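The entries of Table 1 can be checked by applying each rule mechanically to one problem of each type. The Python sketch below rests on assumptions not in the text: the example weights and distances are hypothetical (not Siegler's items), both the Stage 0 response pattern and Rule III's "muddling through" are modeled as a uniform guess among the three outcomes (tip left, tip right, balance), and all function names are ours.

```python
from fractions import Fraction

OUTCOMES = ("left", "right", "balance")
GUESS = {o: Fraction(1, 3) for o in OUTCOMES}  # assumed: uniform guess


def down_side(left, right):
    """Side that goes down when `left` is compared with `right`, or 'balance'."""
    if left == right:
        return "balance"
    return "left" if left > right else "right"


def certain(outcome):
    """A prediction made with certainty."""
    return {o: Fraction(int(o == outcome)) for o in OUTCOMES}


def stage0(wl, dl, wr, dr):
    return GUESS  # neither dimension used systematically


def rule1(wl, dl, wr, dr):
    return certain(down_side(wl, wr))  # weight only


def rule2(wl, dl, wr, dr):
    if wl == wr:  # weight equivocal: fall back on distance
        return certain(down_side(dl, dr))
    return certain(down_side(wl, wr))


def rule3(wl, dl, wr, dr):
    if wl != wr and dl != dr:  # both dimensions differ: "muddle through"
        return GUESS
    return rule2(wl, dl, wr, dr)


def rule4(wl, dl, wr, dr):
    return certain(down_side(wl * dl, wr * dr))  # compare torques


STAGES = [stage0, rule1, rule2, rule3, rule4]

# Hypothetical items: (weight_left, dist_left, weight_right, dist_right).
PROBLEMS = {
    "E": (3, 2, 3, 2),
    "D": (3, 2, 2, 2),
    "S": (2, 3, 2, 2),
    "CD": (4, 2, 2, 3),
    "CS": (3, 1, 1, 4),
    "CE": (2, 3, 3, 2),
}


def table1():
    """Probability that each stage's rule gives the physically correct answer."""
    table = {}
    for kind, (wl, dl, wr, dr) in PROBLEMS.items():
        truth = down_side(wl * dl, wr * dr)
        table[kind] = [stage(wl, dl, wr, dr)[truth] for stage in STAGES]
    return table


for kind, row in table1().items():
    print(kind, " ".join(f"{float(p):.3f}" for p in row))
```

Exact fractions are used so that the .333 cells come out as exactly one third rather than a rounded float.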

[Table  1  about  here] 

A  latent  class  model  for  balance  beam  tasks 

If  the  theory  were  perfect,  the  columns  in  Table  1  would  give  the  probabilities  of 
correct  response  to  the  various  types  of  items  from  children  at  different  stages  of 
understanding.  Observing  a  correct  response  to  an  S  item,  for  example,  would  eliminate 
the  possibility  that  the  child  was  in  Stage  I.  But  because  the  model  is  not  perfect⁴  and 
because  children  make  slips  and  lucky  guesses,  any  response  could  be  observed  from  a 
child  in  any  stage.  A  latent  class  model  (Lazarsfeld,  1950)  can  be  used  to  express  the 
structure  posited  in  Table  1  while  allowing  for  some  “noise”  in  real  data  (see  Appendix  for 
details).  Instead  of  expecting  incorrect  responses  with  probability  one  to  S  items  from 
Stage  I  children,  we  might  posit  some  small  fraction  of  correct  answers,  p(S  correct  | 
Stage=I).  Similar  probabilities  of  “false  positives”  can  be  estimated  for  other  cells  in  Table 
1  containing  0’s.  In  the  same  spirit,  probabilities  less  than  one,  due  to  “false  negatives,” 
can  be  estimated  for  the  cells  with  1’s.  Note  that  inferences  cannot  be  as  strong  when  these 
uncertainties  are  present;  a  correct  response  to  an  S  item  still  suggests  that  a  child  is 
probably  not  in  Stage  I,  but  it  is  no  longer  proof  positive. 

⁴  This  model  assumes  that  the  five  states  are  exhaustive  and  mutually  exclusive.  Alternative  models,  such 
as  those  of  Tatsuoka  and  Yamamoto  mentioned  earlier,  could  be  used  to  relax  these  restrictions. 

Expressing  this  model  in  the  notation  introduced  above,  η  represents  stage 
membership,  x  represents  item  responses,  and  p(x | η)  are  conditional  probabilities  of 
correct  responses  to  items  of  the  various  types  from  children  in  different  stages,  a  noisy 
version  of  Table  1.  The  proportions  of  children  in  a  population  of  interest  at  the  different 
stages  are  p(η),  and  the  probabilities  that  convey  our  knowledge  about  a  child’s  stage  after 
we  have  observed  his  responses  are  p(η | x). 

Siegler  created  a  24-task  test  composed  of  four  tasks  of  each  type.  He  collected 
data  from  60  children,  from  age  3  up  through  college  age,  at  two  points  in  time,  for  a  total 
of  120  response  vectors.  We  fit  a  latent  class  model  to  these  data  using  the  HYBRIL 
computer  program  (Yamamoto,  1987),  obtaining  the  conditional  probabilities  p(x | η) 
shown  in  Table  2  and  the  following  vector  summarizing  the  (estimated)  population 
distribution  of  stage  membership: 

p(η)  =  (Prob(Stage=0),  Prob(Stage=I),  ...,  Prob(Stage=IV)) 

      =  (.257,  .227,  .163,  .275,  .078). 

[Table  2  about  here] 

Note  that  different  types  of  items  are  differentially  useful  to  distinguish  among 
children  at  different  levels.  E  items,  for  example,  are  best  for  distinguishing  Stage  0 
children  from  everyone  else.  CD  items,  which  would  be  dropped  from  standard  tests 
because  their  probabilities  of  correct  response  do  not  have  a  strictly  increasing  relationship 
with  total  scores,  help  differentiate  among  children  at  Stages  II,  III,  and  IV. 

Figure  5  depicts  the  state  of  knowledge  about  a  child  before  any  responses  are  observed, 
using  the  conventions  of  the  MUNIN  figures.  For  simplicity,  just  one  item  of  each  type  is 
shown  rather  than  all  four.  The  status  shown  for  an  observable  node  (i.e.,  an  item  type)  is 
the  expected  probability  of  a  correct  response  from  a  child  selected  at  random  from  the 
population.  The  path  from  the  stage-membership  node  to  a  particular  observable 
node  represents  a  row  of  Table  2. 


[Figure  5  about  here] 

Adaptive  testing 

Figure  5  represents  the  state  of  our  knowledge  about  a  child’s  reasoning  stage  and 
expected  responses  before  any  actual  responses  are  observed.  How  does  knowledge 
change  when  a  response  is  observed?  One  of  the  children  in  the  sample,  Douglas,  gave  an 
incorrect  response  to  his  first  S  item.  This  could  happen  regardless  of  Douglas’  true  stage; 
the  probabilities  are  obtained  by  subtracting  the  entries  in  the  S  row  of  Table  2  from  1.000, 
yielding  (to  within  rounding),  for  Stages  0  through  IV,  .667,  .973,  .116,  .019,  and  .057  respectively. 
This  is  the  likelihood  function  for  η  induced  by  the  observation  of  the  response.  The  bulk  of  the 
evidence  is  for  Stages  0  and  I.  Combining  these  values  with  the  initial  stage  probabilities 
p(η)  via  Bayes’  theorem  yields  updated  stage  probabilities,  p(η | incorrect  response  to  an  S 
item):  for  Stages  0  through  IV  respectively,  .41,  .52,  .04,  .01,  and  .01.  Expectations  for 
items  not  yet  administered  also  change.  They  are  averages  of  the  probabilities  of  correct 
response  expected  from  the  various  stages,  now  weighted  by  the  new  stage  membership 
probabilities.  The  state  of  knowledge  after  observing  Douglas’  first  response  is  depicted  in 
Figure  6  (see  Appendix  for  details;  also  see  Macready  &  Dayton,  1989). 
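The update for Douglas can be reproduced in a few lines. This Python sketch takes the prior from the estimated p(η) above and uses the likelihood values quoted in the paragraph, which agree with 1 minus the S row of Table 2 to within rounding. The names `update` and `expected_correct` are illustrative only; they are not part of HYBRIL or any other program.

```python
# Estimated population distribution p(eta) for Stages 0, I, II, III, IV.
PRIOR = [0.257, 0.227, 0.163, 0.275, 0.078]

# Likelihood of an incorrect S response for each stage, as quoted in the text.
LIK_S_WRONG = [0.667, 0.973, 0.116, 0.019, 0.057]

# Rows of Table 2: probability of a correct response given each stage.
TABLE2 = {
    "E":  [0.333, 0.973, 0.883, 0.981, 0.943],
    "D":  [0.333, 0.973, 0.883, 0.981, 0.943],
    "S":  [0.333, 0.026, 0.883, 0.981, 0.943],
    "CD": [0.333, 0.973, 0.883, 0.333, 0.943],
    "CS": [0.333, 0.026, 0.116, 0.333, 0.943],
    "CE": [0.333, 0.026, 0.116, 0.333, 0.943],
}


def update(prior, likelihood):
    """Bayes' theorem: multiply prior by likelihood and renormalize."""
    joint = [p * lik for p, lik in zip(prior, likelihood)]
    total = sum(joint)
    return [j / total for j in joint]


def expected_correct(stage_probs, p_correct):
    """Expected probability of a correct response, averaged over stages."""
    return sum(q * p for q, p in zip(stage_probs, p_correct))


posterior = update(PRIOR, LIK_S_WRONG)
print("p(stage | incorrect S):", [round(q, 2) for q in posterior])

# Expectations for items not yet administered, before and after the response.
for kind, row in TABLE2.items():
    before = expected_correct(PRIOR, row)
    after = expected_correct(posterior, row)
    print(f"{kind:>2}: {before:.3f} -> {after:.3f}")
```

The rounded posterior printed first is [0.41, 0.52, 0.04, 0.01, 0.01], matching the values given above.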

[Figure  6  about  here] 

In  a  simulation  of  adaptive  testing,  we  updated  our  knowledge  about  Douglas  one 
response  at  a  time,  at  each  step  observing  his  actual  response  to  the  item  expected  to 
reduce  our  uncertainty  about  his  stage  membership  most  substantially.  Figure  7  charts  the 
probabilities  of  stage  membership  for  Douglas  after  each  of  the  first  ten  items,  showing  that 
they  quickly  converge  on  Stage  0. 


[Figure  7  about  here] 


Extending  the  paradigm 

The  balance  beam  exemplar  illustrates  the  challenge  of  inferring  states  of 
understanding,  but  it  addresses  the  development  of  only  a  single  key  concept.  A  major  thrust 
of  our  proposal  is  to  characterize  interconnections  among  distinct  lines  of  development. 
This  section  takes  a  small  step  in  this  direction  by  discussing  a  hypothetical  extension  to  the 
exemplar,  namely,  the  ability  to  carry  out  the  arithmetic  operations  needed  to  calculate 
torques.  For  illustrative  purposes,  we  simply  posit  a  skill  to  carry  these  calculations  out 
reliably,  which  a  child  either  possesses  or  does  not.  States  of  understanding  could  obviously  be 
developed  in  greater  detail  here. 

Calculating  and  comparing  torques  to  solve  the  “conflict”  problems  characterizes 
Stage  IV.  But  if  a  child  at  Stage  IV  cannot  carry  out  the  calculations  reliably,  his  pattern  of 
correct  and  incorrect  responses  would  be  hard  to  distinguish  from  that  of  a  child  in  Stage 
III.  Although  the  two  children  might  answer  about  the  same  number  of  items  correctly,  the 
instruction  appropriate  for  them  would  differ  dramatically.  And  children  at  any  stage  of 
understanding  of  the  balance  beam  might  be  able  to  carry  out  the  computational  operations 
in  isolation.  The  goal  of  the  extended  system  is  to  infer  both  balance-beam  understanding 
and  computational  skill.  To  make  the  distinctions  among  states  of  understanding  in  this 
extended  domain,  we  introduce  two  new  types  of  observations: 

1.  Items  isolating  computation,  such  as  “Which  is  greater,  3×4  or  5×2?” 

2.  Probes  for  introspection  about  solutions  to  conflict  items:  “How  did  you  get  your 
answer?” 

Figure  8  offers  one  possible  structure  for  this  network.  Others  could  be  entertained, 
and  in  practice  one  would  compare  the  degree  to  which  they  accord  with  observed  data.  To 
keep  the  diagram  simple,  only  one  S  task  and  one  CS  task  are 
illustrated.  E  and  D  items  would  have  the  same  paths  as  the  S  task,  and  CD  and  CE  tasks 
would  have  the  same  paths  as  the  CS  task.  Also,  the  paths  from  the  Stage  0,  I,  and  II 
indicators  to  the  balance  beam  tasks  are  not  drawn  in.  The  structure  of  these  paths,  but  not 
necessarily  their  values,  would  be  the  same  as  that  of  the  paths  connecting  the  Stage  III  indicator  to 
those  tasks. 


[Figure  8  about  here] 

There  are  three  kinds  of  unobservable  variables  in  the  system.  The  first  group 
expresses  level  of  understanding  in  the  balance  beam  domain.  It  proves  convenient  to 
express  stage  membership  in  terms  of  dichotomous  indicator  variables  for  each  stage, 
because  of  the  special  relationship  of  Stage  IV  to  computational  skill.  The  second  is  the  ability 
to  carry  out  the  calculations  involved  in  computing  torques.  The  third  concerns  the 
integration  of  balance-beam  understanding  and  calculating  proficiency.  Specifically,  we 
posit  an  indicator  for  whether  a  child  both  is  in  Stage  IV  and  possesses  the  requisite 
computational  skills.  Other  features  of  the  network  worth  mentioning  are  as  follows. 

1.  The  probabilities  of  correct  responses  to  the  pure  computation  items  depend  on  the  unobservable 
computation  variable  only;  they  are  conditionally  independent  of  level  of  balance 
beam  understanding. 

2.  The  correctness  of  an  answer  has  only  two  possibilities,  right  or  wrong,  but 
an  explanation  can  fall  into  five  categories  corresponding  to  levels  of 
understanding.  A  Stage  III  child  might  give  an  explanation  consistent  with  Stages 
0,  I,  II,  or  III,  but  would  not  give  a  Stage  IV  explanation.  Theory  thus  posits  that 
the  conditional  probability  of  a  Stage  K  explanation  from  a  Stage  J  child  is  zero  if 
K > J.  Conditional  probabilities  for  K ≤ J  might  be  estimated  from  data  or  based  on 
experts’  experience.  It  may  turn  out,  for  example,  that  the  most  likely  explanation 
for  an  E  task  from  people  at  Stage  IV  is  a  Stage  II  explanation:  “It 
balances  because  both  the  weights  and  distances  are  equal.” 

3.  For  children  in  Stages  0  through  III,  both  the  right/wrong  answers  and  the  “How” 
answers  to  balance  beam  tasks  depend  only  on  level  of  understanding.  Because 
they  do  not  realize  the  connection  between  the  problems  and  the  torque  calculations, 
their  responses  to  the  balance  beam  tasks  are  conditionally  independent  of  their 
computational  skill,  even  on  items  for  which  that  skill  is  an  integral  component  of 
an  expert  solution. 

4.  For  children  in  Stage  IV,  right/wrong  answers  to  conflict  items  depend  on  the 
understanding/computation  integration  variable,  but  “How”  answers  depend  only 
on  understanding.  A  child  in  Stage  IV  with  low  computational  skill  can  thus  be 
differentiated  from  a  child  in  Stage  III  by  his  higher  probabilities  of  giving  Stage  IV 
explanations  and  of  answering  pure  computation  problems  incorrectly. 
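One way to make the structure in Figure 8 concrete is to record each node's parents and to enforce the zero-probability constraint of point 2 on the explanation table. The Python sketch below is illustrative only: the node names are ours, stage membership is collapsed into one five-level node rather than the five indicators used in the text, and the weights used to fill the explanation table are placeholders, not estimates.

```python
# Hypothetical parent sets for the nodes in Figure 8. For brevity, stage
# membership is one five-level node here rather than five indicators.
PARENTS = {
    "stage": [],                    # 0..4 for Stages 0-IV
    "computation": [],              # True if torques are computed reliably
    "stage4_and_computation": ["stage", "computation"],
    "computation_item": ["computation"],                # point 1
    "S_correct": ["stage"],                             # point 3
    "CS_explanation": ["stage"],                        # points 2 and 4
    "CS_correct": ["stage", "stage4_and_computation"],  # points 3 and 4
}


def integration(stage, computation):
    """Indicator that the child is in Stage IV and has the computational skill."""
    return stage == 4 and computation


def explanation_cpt(weights):
    """Table of p(explanation at level K | Stage J), with zeros wherever K > J.

    weights[J][K] are nonnegative placeholder weights; entries with K > J are
    discarded and each row is rescaled to sum to one.
    """
    table = []
    for j, row in enumerate(weights):
        kept = [w if k <= j else 0.0 for k, w in enumerate(row)]
        total = sum(kept)
        if total == 0:
            raise ValueError(f"stage {j} needs positive weight on some K <= {j}")
        table.append([w / total for w in kept])
    return table


# Usage with equal (purely illustrative) weights.
cpt = explanation_cpt([[1.0] * 5 for _ in range(5)])
for j, row in enumerate(cpt):
    print(f"Stage {j}:", [round(p, 2) for p in row])
```

In practice the weights for K ≤ J would come from data or expert judgment, as the text notes. The masking step only guarantees that no child is assigned a chance of giving an explanation above his or her stage.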

Discussion 

The  conceptual  framework  described  above  holds  the  promise  of  extending  and 
clarifying  standard  educational  measurement  practices  in  several  ways: 

Connections  with  instruction  can  be  forged  more  easily  than  with  standard  tests, 
because  the  focus  is  no  longer  on  how  many  questions  a  student  can  answer,  but  on  how  they 
answer  them.  In  medical  diagnosis,  different  diseases  can  give  rise  to  similar  results  on  certain 
tests;  in  education,  so  too  can  different  approaches  lead  to  similar  test  scores  for  students. 
But  accounting  for  the  patterns  of  performance,  especially  if  probing  adaptively,  can 
pinpoint  the  areas  that  need  attention  to  best  improve  performance. 

Student  reports  can  be  provided  at  varying  levels  of  detail,  highlighting  different  features 
of  a  student’s  status.  Of  particular  importance  to  the  student  and  the  teacher  are  reports  in 
terms  of  levels  or  stages  of  understanding  of  key  concepts,  since  this  is  the  level  at  which 
instruction  is  aimed.  For  the  quality  control  purposes  of  administrators,  however,  one 
could  predict  a  student’s  performance  on  a  standard  set  of  tasks  in  the  domain,  say,  a 
“market  basket”  of  tasks  that,  ideally,  every  student  should  eventually  be  able  to  handle. 

Use  of  different  strategies  or  mental  models  can  be  accommodated  in  an  inference 
network.  This  can  take  the  form  of  either  a  single  strategy/mental  model  choice  for  all  tasks 
in  a  class,  as  studied  by  Mislevy  and  Verhelst  (1990),  or  strategy/model  switching  from 
one  task  to  another  (as  in  Snow  &  Lohman,  1984).  The  nature  and  the  strength  of 
inferences  one  can  draw  will  depend  on  the  potential  observational  settings.  With  rich 
information,  such  as  verbal  protocols  or  partial  solutions,  it  may  be  possible  to  characterize 
the  range  of  solution  methods  the  student  has  available  and  the  conditions  under  which  he 
employs  them. 

Testing  “higher-order  thinking”  can  be  accomplished  by  including  unobservable 
nodes  for  connections  among  more  basic  facts  or  concepts,  and  observable  nodes  that 
correspond  to  tasks  for  which  the  relationships  of  interest  are  critical.  Because  such  tasks 
might  well  be  open-ended  and  approachable  in  a  variety  of  ways,  the  possibility  of 
alternative  solution  strategies  would  need  to  be  built  into  the  network. 

Adaptive  testing  can  be  carried  out  among  concepts,  not  just  within  a  single  concept. 
IRT  applications  of  adaptive  testing  are  based  on  the  one-to-many  relationships  that  are 
appropriate  for  determining  overall  levels  of  proficiency,  but  inadequate  for  understanding 
connections  among  concepts.  The  inference  network  makes  it  possible  to  move  flexibly 
throughout  a  domain,  gathering  information  about  critical  areas  by  presenting  tasks  that 
call  for  varying  combinations  of  key  skills. 

Handling  atypical  knowledge  configurations  or  observational  patterns  can  be 
accomplished  by  incorporating  nodes  analogous  to  the  “Other”  disease  state  in  MUNIN  or 
the  catch-all  IRT  class  in  Yamamoto’s  (1987)  Hybrid  model.  An  “Other”  state  of 
understanding  is  a  mechanism  for  capturing  observational  patterns  that  do  not  accord  with 
those  specifically  built  into  the  network.  A  situation-sensitive  student  report  might  be 
generated  in  an  instructional  system  when  such  a  node  becomes  prominent,  signalling  that 
more  intelligence  than  is  embodied  in  the  system  is  needed  to  figure  out  what  this  student  is 
doing  and  to  decide  what  to  do  about  it. 


Conclusion 

Learning  can  be  enhanced  by  a  unified  conceptual  framework  for  instruction, 
testing,  and  reporting,  because  only  in  such  a  framework  can  coherent  feedback  loops  be 
constructed.  This  presentation  has  focused  on  the  educational  measurement  aspect  of  a 
system  built  on  this  premise.  The  recent  introduction  of  measurement  models  built  around 
states  of  understanding,  and  of  inferential  techniques  to  connect  such  pieces  into  networks 
that  describe  domains  of  school  learning,  provides  a  foundation  for  improving  educational 
practice  in  this  manner. 
Appendix 

Equations  for  the  Latent  Class  Model 


The  Model 

Let  $\eta = (\eta_0, \ldots, \eta_4)$  denote  the  stage  of  understanding  of  a  child,  with  $\eta_k = 1$  if  he  or 
she  is  in  Stage  $k$  and  0  if  not.  Let  $\pi = (\pi_0, \ldots, \pi_4)$  denote  the  population  proportions  of 
children  in  these  classes;  that  is,  $\pi_k = p(\eta_k = 1)$.  Let  $x_j$  represent  a  response  to  Task  $j$,  1  if 
correct  and  0  if  not;  $j$  runs  from  1  to  24.  The  conditional  probabilities  of  correct  response 
are  $\mathrm{Prob}(x_j = 1 \mid \eta_k = 1)$,  or  $P_{jk}$  for  short.  $\mathbf{P}$  denotes  the  matrix  $((P_{jk}))$.  A  vector  of  item 
responses,  $\mathbf{x} = (x_1, \ldots, x_{24})$,  is  assumed  to  have  the  following  probability  conditional  on 
Stage  membership: 

$$P(\mathbf{x} \mid \eta_k = 1) = \prod_j P_{jk}^{x_j} (1 - P_{jk})^{1 - x_j}. \qquad (1)$$

Similar  expressions  are  assumed  to  hold  for  subsets  of  responses  as  well,  regardless  of  the 
order  in  which  they  are  observed. 

The  marginal  probability  of  a  response  vector  is  an  average  of  terms  like  (1), 
weighted  by  the  population  probabilities  of  stage  membership: 

$$p(\mathbf{x}) = \sum_{k=0}^{4} P(\mathbf{x} \mid \eta_k = 1)\, \pi_k. \qquad (2)$$
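Equations (1) and (2) translate directly into code. In this Python sketch, `P[j][k]` holds P_jk and `pi[k]` holds π_k; the function names are hypothetical, and the two-item example uses the S and CS rows of Table 2 together with the population proportions reported in the text.

```python
import itertools
import math


def class_likelihood(x, P, k):
    """Equation (1): probability of response vector x given Stage k.

    x[j] is 1 (correct) or 0 (incorrect); P[j][k] = Prob(x_j = 1 | eta_k = 1).
    Responses are treated as conditionally independent given the stage.
    """
    return math.prod(P[j][k] if xj == 1 else 1.0 - P[j][k] for j, xj in enumerate(x))


def marginal(x, P, pi):
    """Equation (2): p(x) = sum over k of P(x | eta_k = 1) * pi_k."""
    return sum(class_likelihood(x, P, k) * pi_k for k, pi_k in enumerate(pi))


PI = [0.257, 0.227, 0.163, 0.275, 0.078]      # estimated pi from the text
S_ROW = [0.333, 0.026, 0.883, 0.981, 0.943]   # Table 2, S items
CS_ROW = [0.333, 0.026, 0.116, 0.333, 0.943]  # Table 2, CS items
P = [S_ROW, CS_ROW]                           # a two-item example

for x in itertools.product([0, 1], repeat=len(P)):
    print(x, round(marginal(x, P, PI), 4))
```

Because each class-conditional distribution sums to one over all response patterns, the marginal probabilities over the four possible two-item patterns also sum to one.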

Let  $\mathbf{X}$  denote  the  matrix  of  response  vectors  of  a  sample  of  $N$  respondents.  For  a  generic 
pattern  $\mathbf{x}_\ell$,  let  $n_\ell$  be  the  number  of  respondents  producing  this  pattern.  The  probability  of 
$\mathbf{X}$  as  a  function  of  $\mathbf{P}$  and  $\pi$  has  the  form 

$$P(\mathbf{X} \mid \mathbf{P}, \pi) = C \prod_\ell p(\mathbf{x}_\ell)^{n_\ell}, \qquad (3)$$

where  $C$  does  not  depend  on  $\mathbf{P}$  or  $\pi$.  Once  $\mathbf{X}$  has  been  observed,  (3)  can  be  interpreted  as 
a  likelihood  function,  and  maxima  may  be  found  with  respect  to  $\mathbf{P}$  and  $\pi$. 

Because  N  is  only  120  in  the  balance  beam  example,  a  number  of  constraints  were 
introduced  so  that  stable  estimates  would  be  obtained.  Many  could  be  relaxed  or  removed 
with  larger  samples.  The  results  reported  in  Table  2  represent  the  best-fitting  result  among 
several  models  with  similar  numbers  of  constraints.  The  $P_{jk}$s  that  appear  as  .333  in  Table 
1  were  fixed  at  that  value.  All  four  items  of  a  given  type  were  constrained  to  have  the  same 
$P_{jk}$s.  For  a  given  column,  all  $P_{jk}$s  in  cells  that  correspond  to  1’s  in  Table  1  were 
constrained  to  be  equal  to  a  single  estimated  value.  Any  cells  in  that  column  that 
correspond  to  0’s  were  constrained  to  equal  its  complement,  one  minus  that  value. 
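The equality constraints just described can be expressed as a function that expands one free value per stage column into the full 24 × 5 matrix P. In this Python sketch the free values passed in are the 1's-cell entries read from Table 2; their complements come out .001 higher than the printed 0's-cell entries (for example, .027 rather than .026), presumably because the published values were rounded. Function and variable names are illustrative.

```python
# Table 1 pattern for each item type, columns = Stages 0-IV:
# 1 -> shared "right" value, 0 -> its complement, None -> fixed at .333.
PATTERN = {
    "E":  [None, 1, 1, 1, 1],
    "D":  [None, 1, 1, 1, 1],
    "S":  [None, 0, 1, 1, 1],
    "CD": [None, 1, 1, None, 1],
    "CS": [None, 0, 0, None, 1],
    "CE": [None, 0, 0, None, 1],
}
FIXED = 0.333


def constrained_P(column_values, items_per_type=4):
    """Build the 24 x 5 matrix P from one free value per stage column."""
    P = []
    for cells in PATTERN.values():
        row = []
        for k, cell in enumerate(cells):
            if cell is None:
                row.append(FIXED)
            elif cell == 1:
                row.append(column_values[k])
            else:
                row.append(1.0 - column_values[k])
        # All items of a given type share the same row.
        P.extend(list(row) for _ in range(items_per_type))
    return P


# Free values for the 1's cells, read from Table 2 (Stage 0 has none).
P = constrained_P([None, 0.973, 0.883, 0.981, 0.943])
print(len(P), "rows; first S row:", [round(p, 3) for p in P[8]])
```

Under these constraints only four values are free (one for each of Stages I to IV), plus the population proportions π, which is what makes estimation feasible with 120 response vectors.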

Adaptive  Testing 

The  maximum  likelihood  estimates  of  $\mathbf{P}$  and  $\pi$  were  treated  as  known  true 
parameter  values  during  simulated  adaptive  testing.  The  uncertainty  in  these  values  could 
be  taken  into  account,  but  we  have  avoided  the  complication  for  this  demonstration. 

Before  observing  any  responses  from  a  given  child,  the  expected  value  of  his  $\eta$  is 
the  population  value  $\pi$.  The  expected  value  of  a  response  to  a  particular  item  $j$  is  obtained 
analogously  to  (2),  simplified  to  a  single,  as  yet  unobserved,  response: 

$$p(x_j = 1) = \sum_k P(x_j = 1 \mid \eta_k = 1)\, p(\eta_k = 1) = \sum_k P_{jk}\, p(\eta_k = 1). \qquad (4)$$

Suppose  that  Item  $g$  is  administered  to  a  particular  examinee,  and  the  value  of  $x_g$, 
either  0  or  1,  becomes  known.  How  is  this  information  propagated  through  the  network? 
First,  using  Bayes’  theorem,  we  update  the  probabilities  for  his  $\eta$.  For  $k = 0, \ldots, 4$, 

$$p(\eta_k = 1 \mid x_g) = \frac{P(x_g \mid \eta_k = 1)\, p(\eta_k = 1)}{\sum_h P(x_g \mid \eta_h = 1)\, p(\eta_h = 1)}. \qquad (5)$$

This  gives  new  probabilities  that  the  examinee  is  in  each  of  the  possible  stages.  These  are 
in  turn  reflected  in  new  expectations  for  items  not  yet  administered,  obtained  by  replacing  $p(\eta_k = 1)$  in 
(4)  with  $p(\eta_k = 1 \mid x_g)$: 

$$p(x_j = 1 \mid x_g) = \sum_k p(x_j = 1 \mid \eta_k = 1)\, p(\eta_k = 1 \mid x_g). \qquad (6)$$

This  process  can  be  repeated  with  additional  items  presented  one  at  a  time.  Let  $\mathbf{x}_s$ 
represent  a  partial  response  sequence;  Item  $s+1$  is  administered  next,  and  its  response  $x_{s+1}$  is  appended  to  form  $\mathbf{x}_{s+1}$.  Then 

$$p(\eta_k = 1 \mid \mathbf{x}_{s+1}) = \frac{P(x_{s+1} \mid \eta_k = 1)\, p(\eta_k = 1 \mid \mathbf{x}_s)}{\sum_h P(x_{s+1} \mid \eta_h = 1)\, p(\eta_h = 1 \mid \mathbf{x}_s)} \qquad (7)$$


and,  for  items  not  yet  presented, 

$$p(x_j = 1 \mid \mathbf{x}_{s+1}) = \sum_k p(x_j = 1 \mid \eta_k = 1)\, p(\eta_k = 1 \mid \mathbf{x}_{s+1}). \qquad (8)$$

Selecting  which  item  to  present  next  and  deciding  when  to  stop  depend  on  the 
probabilities  for  $\eta$.  In  this  paper  we  have  addressed  only  the  case  in  which  no  decision-making 
cost  structure  is  available,  and  we  address  only  the  goal  of  minimizing  uncertainty 
about  $\eta$.  This  can  be  accomplished  by  minimum  entropy  adaptive  testing.  Entropy  is  a 
measure  of  randomness.  For  the  five-class  balance  beam  problem,  the  maximal  value  of 
entropy  occurs  when  the  probabilities  of  all  five  classes  are  equal,  and  the  minimal  value 
occurs  when  the  probability  of  one  particular  stage  is  one.  The  general  formula  for  entropy 
after  having  observed  $\mathbf{x}_s$  is 

$$E(\mathbf{x}_s) = -\sum_k p(\eta_k = 1 \mid \mathbf{x}_s) \log p(\eta_k = 1 \mid \mathbf{x}_s). \qquad (9)$$

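After  having  observed  $\mathbf{x}_s$,  one  can  evaluate  the  expected  entropy  associated  with  the 
administration  of  any  remaining  item  $j$  as 

$$E[\mathbf{x}_s \cap (x_j = 0)]\, p(x_j = 0 \mid \mathbf{x}_s) + E[\mathbf{x}_s \cap (x_j = 1)]\, p(x_j = 1 \mid \mathbf{x}_s). \qquad (10)$$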


The  item  that  minimizes  (10)  is  presented  next. 
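A minimal sketch of minimum entropy item selection under the same assumptions (a correct model and known P and π): it evaluates (9) with natural logarithms (the base does not change which item minimizes (10)), forms the expected posterior entropy (10) for each remaining item, and picks the smallest. The item rows come from Table 2; using one representative row per item type and the function names shown are our simplifications.

```python
import math

PRIOR = [0.257, 0.227, 0.163, 0.275, 0.078]
ITEMS = {  # one representative row of Table 2 per item type
    "E":  [0.333, 0.973, 0.883, 0.981, 0.943],
    "D":  [0.333, 0.973, 0.883, 0.981, 0.943],
    "S":  [0.333, 0.026, 0.883, 0.981, 0.943],
    "CD": [0.333, 0.973, 0.883, 0.333, 0.943],
    "CS": [0.333, 0.026, 0.116, 0.333, 0.943],
    "CE": [0.333, 0.026, 0.116, 0.333, 0.943],
}


def entropy(probs):
    """Equation (9), natural log; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def posterior(probs, p_correct, x):
    """Equations (5)/(7): update stage probabilities after response x (0 or 1)."""
    joint = [q * (p if x == 1 else 1.0 - p) for q, p in zip(probs, p_correct)]
    total = sum(joint)
    return [j / total for j in joint]


def expected_entropy(probs, p_correct):
    """Equation (10): posterior entropy averaged over the two possible responses."""
    p1 = sum(q * p for q, p in zip(probs, p_correct))  # predictive, as in (4)/(8)
    result = 0.0
    for x, px in ((0, 1.0 - p1), (1, p1)):
        if px > 0:
            result += px * entropy(posterior(probs, p_correct, x))
    return result


def next_item(probs, remaining):
    """Return the remaining item that minimizes expected entropy."""
    return min(remaining, key=lambda name: expected_entropy(probs, remaining[name]))


print("current entropy:", round(entropy(PRIOR), 3))
for name, row in ITEMS.items():
    print(name, round(expected_entropy(PRIOR, row), 3))
print("next item:", next_item(PRIOR, ITEMS))
```

In a full simulation, the chosen item would be removed from the pool, the observed response passed to `posterior`, and the selection repeated. A stopping rule, such as halting once entropy falls below some threshold, would be an additional assumption; the text leaves that decision open.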

It  bears  repeating  that  these  formulae  assume  both  that  the  model  is  correct  and  that  the 
conditional  probabilities  are  known  with  certainty.  Violations  of  these  assumptions 
generally  degrade  knowledge  about  an  examinee’s  state,  making  (5)  and  (8)  in  particular 
overly  optimistic.  Work  remains  to  be  done  in  studying  the  robustness  of  the  approach  to 
violations  of  the  assumptions,  learning  how  to  minimize  violations  in  practice,  and 
modifying  the  model  or  the  conditional  probabilities  to  mitigate  inferential  errors  in  the 
presence  of  violations. 
TABLE  1 

Theoretical  Conditional  Probabilities: 
Expected  Proportions  of  Correct  Response 

Problem  type    Stage  0    Stage  I    Stage  II    Stage  III    Stage  IV 
E                 .333       1.000       1.000        1.000         1.000 
D                 .333       1.000       1.000        1.000         1.000 
S                 .333        .000       1.000        1.000         1.000 
CD                .333       1.000       1.000         .333         1.000 
CS                .333        .000        .000         .333         1.000 
CE                .333        .000        .000         .333         1.000 

TABLE  2 

Estimated  Conditional  Probabilities: 
Expected  Proportions  of  Correct  Response 

Problem  type    Stage  0    Stage  I    Stage  II    Stage  III    Stage  IV 
E                 .333*       .973        .883         .981          .943 
D                 .333*       .973        .883         .981          .943 
S                 .333*       .026        .883         .981          .943 
CD                .333*       .973        .883         .333*         .943 
CS                .333*       .026        .116         .333*         .943 
CE                .333*       .026        .116         .333*         .943 

*  denotes  fixed  value 
FIGURE  1 

The  MUNIN  Network:  Initial  Status 


(From  Andreassen  et  al.,  1987) 
FIGURE  2 

The  MUNIN  Network:  After  Selected  Observations 


(From  Andreassen  et  al.,  1987) 


When  the  blocks  are  removed,  will  the 
beam  tip  left,  tip  right,  or  stay  flat? 


Figure  3 

A  Sample  Balance-Beam  Task 


Item  Type    Description    (sample  item  drawings  not  recoverable) 

E     Equal  problems  (E),  with  matching  weights  and  lengths  on  both  sides. 
D     Dominant  problems  (D),  with  unequal  weights  but  equal  lengths. 
S     Subordinate  problems  (S),  with  unequal  lengths  but  equal  weights. 
CD    Conflict-dominant  problems  (CD),  in  which  one  side  has  greater  weight,  the  other  has  greater 
      length,  and  the  side  with  the  heavier  weight  will  go  down. 
CS    Conflict-subordinate  problems  (CS),  in  which  one  side  has  greater  weight,  the  other  has  greater 
      length,  and  the  side  with  the  greater  length  will  go  down. 
CE    Conflict-equal  problems  (CE),  in  which  one  side  has  greater  weight,  the  other  has  greater 
      length,  and  the  beam  will  balance. 


Figure  4 

Sample  Balance  Beam  Items 
[Figure:  network  diagram  not  recoverable;  surviving  node  labels  are  “Posterior  Probabilities  of  Cognitive  Levels,” 
“Problem  Type,”  “S-item:  Correctness,”  “CS-item:  Explanation,”  “CS-item:  Correctness,”  and  “Computation”  (two  nodes).] 


Figure  8 

Representation  of  an  Extended  Balance-Beam  Network 


Educational  Testing  Service/Mislevy 


KYKV90 


Dr.  Alfred  R.  Fregly 
AFOSR/NL,  Bldg  410 
Bolling  AFB,  DC  20332-6448 

Dr,  Robert  D.  Gibbon* 

Illinois  State  Paycbiatric  I  rut. 

Ra  529W 

1601  W.  Taylor  Street 
Chicago,  IL  60612 

Dr.  Janice  Gifford 
Univeraity  of  Maaaachuaetu 
School  of  Education 
Amherst,  MA  01003 

Dr.  Drew  Gitomcr 
Educational  Teating  Service 
Princeton,  NJ  08541 

Dr.  Robert  G  later 
Learning  Raacarch 
A  Development  Center 
Univartity  of  Pittsburgh 
3939  O’Hara  Street 
Pittsburgh.  PA  15260 

Dr.  Ben  Green 
John*  Hopbine  Univeraity 
Department  of  Psychology 
Charie*  A  34tb  Street 
Baltimore,  MD  21218 

Michael  Habon 
DORNIER  GMBH 
P.O.  Box  1420 
D-7990  Frkdrichstufcn  1 
WEST  GERMANY 

Prof.  Edward  Heertel 
School  of  Education 
Stanford  Univeraity 
Stanford,  CA  94305 

Dr. Ronald K. Hambleton
University of Massachusetts
Laboratory of Psychometric
and Evaluative Research
Hills South, Room 152
Amherst, MA 01003

Dr. Delwyn Harnisch
University of Illinois
51 Gerty Drive
Champaign,  IL  61820 

Dr.  Grant  Henning 
Senior Research Scientist
Division of Measurement
Research and Services
Educational Testing Service
Princeton,  NJ  08541 

Ms.  Rebecca  Hetter 
Navy Personnel R&D Center
Code 63
San Diego, CA 92152-6800

Dr.  Thomas  M.  Hirsch 
ACT
P. O. Box 168
Iowa City, IA 52243

Dr.  Paul  W.  Holland 
Educational Testing Service, 21-T
Rosedale Road
Princeton,  NJ  08541 

Dr. Paul Horst
677 G Street, #184
Chula Vista, CA 92010

Dr.  Lloyd  Humphreys 
University  of  Illinois 
Department  of  Psychology 
603  East  Daniel  Street 
Champaign,  IL  61820 


Dr. Steven Hunka
3-104 Educ. N.
University of Alberta
Edmonton, Alberta
CANADA T6G 2G5

Dr.  Huynh  Huynh 
College  of  Education 
Univ. of South Carolina
Columbia, SC 29208

Dr. Robert Jannarone
Elec. and Computer Eng. Dept.
University  of  South  Carolina 
Columbia,  SC  29208 

Dr. Kumar Joag-dev
University of Illinois
Department of Statistics
101 Illini Hall
725  South  Wright  Street 
Champaign,  IL  61820 

Dr. Douglas H. Jones
1280 Woodfern Court
Toms  River,  NJ  08753 

Dr. Brian Junker
Carnegie-Mellon University
Department of Statistics
Schenley Park
Pittsburgh, PA 15213

Dr. Milton S. Katz
European Science Coordination
Office
U.S. Army Research Institute
Box 65
FPO New York 09510-1500

Prof. John A. Keats
Department  of  Psychology 
University  of  Newcastle 
NSW,  2308 
AUSTRALIA 

Dr. Jwa-keun Kim
Department of Psychology
Middle Tennessee State
University
P.O. Box 522
Murfreesboro, TN 37132

Mr.  Soon-Hoon  Kim 
Computer-based Education
Research Laboratory
University of Illinois
Urbana, IL 61801

Dr.  G.  Gage  Kingsbury 
Portland  Public  Schools 
Research  and  Evaluation  Department 
501  North  Dixon  Street 
P.  O.  Box  3107 
Portland,  OR  97209-3107 

Dr.  William  Koch 
Box 7246, Meas. and Eval. Ctr.
University of Texas-Austin
Austin, TX 78703

Dr.  Richard  J.  Koubek 
Department  of  Biomedical 
& Human Factors
139 Engineering & Math Bldg
Wright State University
Dayton,  OH  45435 

Dr.  Leonard  Kroeker 
Navy Personnel R&D Center
Code 62
San Diego, CA 92152-6800


Dr. Jerry Lehnus
Defense Manpower Data Center
Suite 400
1600 Wilson Blvd
Rosslyn, VA 22209

Dr.  Thomas  Leonard 
University of Wisconsin
Department of Statistics
1210 West Dayton Street
Madison, WI 53705

Dr.  Michael  Levine 
Educational  Psychology 
210  Education  Bldg 
University of Illinois
Champaign,  IL  61801 

Dr.  Charles  Lewis 
Educational Testing Service
Princeton,  NJ  08541-0001 

Mr.  Rodney  Lim 
University of Illinois
Department of Psychology
603 E. Daniel St.
Champaign, IL 61820

Dr. Robert L. Linn
Campus Box 249
University of Colorado
Boulder, CO 80309-0249

Dr.  Robert  Lockman 
Center for Naval Analysis
4401 Ford Avenue
P.O. Box 16268
Alexandria, VA 22302-0268

Dr.  Frederic  M.  Lord 
Educational Testing Service
Princeton, NJ 08541

Dr.  Richard  Luecht 
ACT
P. O. Box 168
Iowa City, IA 52243

Dr. George B. Macready
Department of Measurement
Statistics & Evaluation
College  of  Education 
University  of  Maryland 
College  Park,  MD  20742 

Dr.  Gary  Marco 
Stop 31-E
Educational Testing Service
Princeton, NJ 08541

Dr. Clessen J. Martin
Office of Chief of Naval
Operations (OP 13 F)
Navy Annex, Room 2832
Washington, DC 20350

Dr.  James  R.  McBride 
The  Psychological  Corporation 
1250 Sixth Avenue
San  Diego,  CA  92101 

Dr.  Clarence  C.  McCormick 
HQ, USMEPCOM/MEPCT
2500 Green Bay Road
North  Chicago,  IL  60064 

Mr. Christopher McCusker
University of Illinois
Department of Psychology
603 E. Daniel St.
Champaign, IL 61820

Dr.  Robert  McKinley 
Educational Testing Service
Princeton, NJ 08541




Mr. Alan Mead
c/o Dr. Michael Levine
Educational Psychology
210 Education Bldg.
University of Illinois
Champaign, IL 61801

Dr. Timothy Miller
Iowa City, IA 52243

Dr. Robert Mislevy
Educational Testing Service
Princeton, NJ 08541

Dr.  William  Montague 
NPRDC  Code  13 
San Diego, CA 92152-6800

Ms.  Kathleen  Moreno 
Navy Personnel R&D Center
Code 62
San Diego, CA 92152-6800

Headquarters, Marine Corps
Code  MPI-20 
Washington,  DC  20380 

Dr.  Ratna  Nandakumar 
Educational  Studies 
Willard  Hall,  Room  213E 
University of Delaware
Newark, DE 19716

Dr. Harold F. O’Neil, Jr.
School of Education, WPH 801
Department of Educational
Psychology & Technology
University of Southern California
Los Angeles, CA 90089-0031

Dr. James B. Olsen
WICAT Systems
1875 South State Street
Orem, UT 84058

Dr. Judith Orasanu
Basic Research Office
Army Research Institute
5001  Eisenhower  Avenue 
Alexandria,  VA  22333 

Dr. Jesse Orlansky
Institute for Defense Analyses
1801  N.  Beauregard  St 
Alexandria,  VA  22311 

Dr. Peter J. Pashley
Educational Testing Service
Rosedale Road
Princeton, NJ 08541

Wayne M. Patience
American Council on Education
GED Testing Service, Suite 20
One Dupont Circle, NW
Washington, DC 20036

Dr. James Paulson
Department of Psychology
Portland State University
P.O.  Box  751 
Portland,  OR  97207 

Dr. Mark D. Reckase
£3e.W
Iowa City, IA 52243

Dr.  Malcolm  Ree 
AFHRL/MOA
Brooks AFB, TX 78235


Mr. Steve Reise
N660 Elliott Hall
University of Minnesota
75 E River Road
Minneapolis, MN 554554344

Dr. Carl Ross
CNET-PDCD
Building 90
Great Lakes NTC, IL 60088

Dr. J. Ryan
Department of Education
University of South Carolina
Columbia, SC 29208

Dr.  Fumiko  Samejima 
Department of Psychology
University of Tennessee
310B Austin Peay Bldg.
Knoxville,  TN  379164900 

Mr. Drew Sands
NPRDC Code 62
San Diego, CA 92152-6800

Lowell Schoer
Psychological & Quantitative
Foundations
College of Education
University of Iowa
Iowa City, IA 52242

Dr. Mary Schratz
905  Orchid  Way 
Carlsbad,  CA  92009 

Dr. Dan Segall
Navy Personnel R&D Center
San Diego, CA 92152

Dr. Robin Shealy
University of Illinois
Department of Statistics
101 Illini Hall
725 South Wright St.
Champaign, IL 61820

Dr. Kazuo Shigemasu
7-9-24 Kugenuma-Kaigan
Fujisawa  251 
JAPAN 

Dr. Richard E. Snow
School of Education
Stanford University
Stanford, CA 94305

Dr. Richard C. Sorensen
Navy Personnel R&D Center
San Diego, CA 92152-6800

Dr.  Judy  Spray 
ACT
P.O. Box 168
Iowa City, IA 52243

Dr. Martha Stocking
Educational Testing Service
Princeton, NJ 08541

Dr.  Peter  Stoloff 
Center  for  Naval  Analysis 
4401  Ford  Avenue 
P.O.  Box  16268 
Alexandria, VA 22302-0268

Dr.  William  Stout 
University of Illinois
Department of Statistics
101 Illini Hall
725  South  Wright  St. 
Champaign,  IL  61820 


Dr. Hariharan Swaminathan
Laboratory of Psychometric and
Evaluation Research
School of Education
University of Massachusetts
Amherst,  MA  01003 

Mr. Brad Sympson
Navy Personnel R&D Center
Code 62
San Diego, CA 92152-6800

Dr. John Tangney
AFOSR/NL, Bldg. 410
Bolling AFB, DC 20332-6448

Dr. Kikumi Tatsuoka
Educational Testing Service
Mail Stop 03-T
Princeton, NJ 08541

Dr.  Maurice  Tatsuoka 
Educational Testing Service
Mail Stop 03-T
Princeton, NJ 08541

Dr. David Thissen
Department of Psychology
University of Kansas
Lawrence,  KS  66044 

Mr. Thomas J. Thomas
Johns Hopkins University
Department of Psychology
Charles & 34th Street
Baltimore,  MD  21218 

Mr. Gary Thomasson
University of Illinois
Educational  Psychology 
Champaign,  IL  61820 

Dr.  Robert  Tsutakawa 
University of Missouri
Department of Statistics
222 Math. Sciences Bldg
Columbia, MO 65211

Dr. Ledyard Tucker
University of Illinois
Department of Psychology
603 E. Daniel Street
Champaign,  IL  61820 

Dr. David Vale
Assessment Systems Corp.
2233 University Avenue
Suite 440
St. Paul, MN 55114

Dr. Frank L. Vicino
Navy Personnel R&D Center
San Diego, CA 92152-6800

Dr.  Howard  Wainer 
Educational Testing Service
Princeton, NJ 08541

Dr.  Michael  T.  Waller 
University of Wisconsin-Milwaukee
Educational Psychology Department
Box 413
Milwaukee, WI 53201

Dr. Ming-Mei Wang
Educational Testing Service
Mail Stop 03-T
Princeton, NJ 08541

Dr.  Thomas  A.  Warm 
FAA  Academy  AAC934D 
P.O.  Box  25062 
Oklahoma City, OK 73125


Dr. Brian Waters
HumRRO
1100 S. Washington
Alexandria, VA 22314


Dr. David J. Weiss
N660 Elliott Hall
University of Minnesota
75 E. River Road
Minneapolis, MN 554554344

Dr. Ronald A. Weitzman
Box 146
Carmel, CA 93921

Major  John  Webb 
AFHRL/MOAN
Brooks AFB, TX 78223

Dr.  Douglas  Wetzel 
Code 51
Navy Personnel R&D Center
San Diego, CA 92152-6800

Dr. Rand R. Wilcox
University of Southern
California
Department of Psychology
Los Angeles, CA 90089-1061

German  Military  Representative 
ATTN: Wolfgang Wildgrube
Streitkraefteamt
D-5300 Bonn 2
4000 Brandywine Street, NW
Washington, DC 20016


Dr. Bruce Williams
Department of Educational
Psychology
University of Illinois
Urbana, IL 61801

Dr. Hilda Wing
Federal  Aviation  Administration 
800  Independence  Ave,  SW 
Washington,  DC  20591 


Mr. John H. Wolfe
Navy Personnel R&D Center
San Diego, CA 92152-6800

Dr.  George  Wong 
Biostatistics Laboratory
Memorial Sloan-Kettering
Cancer Center
1275 York Avenue
New York, NY 10021

Dr. Wallace Wulfeck, III
Navy Personnel R&D Center
Code 51
San Diego, CA 92152-6800

Dr. Kentaro Yamamoto
02-T
Educational Testing Service
Rosedale Road
Princeton, NJ 08541


Dr. Wendy Yen
CTB/McGraw Hill
Del Monte Research Park
Monterey, CA 93940

Dr.  Joseph  L.  Young 
National  Science  Foundation 
Room  320 
1800  G  Street,  N.W. 
Washington,  DC  20550 

Mr.  Anthony  R.  Zara 
National  Council  of  State 
Boards  of  Nursing  Inc. 
625  North  Michigan  Avenue 
Suite  1544 
Chicago,  IL  60611 


